2. Tests for Stationarity

How do you check if a time series is stationary?

  • Visual inspection for constant mean, constant variance

  • KPSS test (\(H_0\): Stationary)

  • ADF test (\(H_0\): Not Stationary)

  • PP test (\(H_0\): Not Stationary)

  • Fit AR(1), and see if \(\hat \phi_1\) is significantly different from 1.
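The first three tests can be run in R; a minimal sketch, assuming the tseries package (which provides the kpss.test(), adf.test(), and pp.test() that appear in the output later in these notes):

```r
# Run the three stationarity tests on a simulated stationary AR(1) series.
library(tseries)   # assumed installed; provides kpss.test, adf.test, pp.test

set.seed(1)
x <- arima.sim(model = list(ar = 0.5), n = 200)  # stationary AR(1)

kpss.test(x)  # H0: stationary  -> expect a LARGE p-value
adf.test(x)   # H0: unit root   -> expect a small p-value
pp.test(x)    # H0: unit root   -> expect a small p-value
```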



KPSS test

Kwiatkowski-Phillips-Schmidt-Shin (1992) test for trend stationarity

  • Default choice in auto.arima() \[ \left\{ \begin{array}{ll} H_0: X_t \mbox{ is trend stationary}\\ H_A: X_t \mbox{ is not stationary} \\ \end{array} \right. \]

  • Large p-value means “Stationary”.

  • Decompose the series as \[ X_t \hspace{3mm} = \hspace{3mm} T_t + W_t + Y_t \] where \(T_t\) is deterministic trend, \(W_t\) is random walk, and \(Y_t\) is stationary error.

  • Lagrange multiplier test that the random walk has zero variance.


ADF test

Augmented Dickey-Fuller test (Brockwell p. 194)

  • “Unit-root” test

  • The stationarity hypothesis \[ \left\{ \begin{array}{ll} H_0: Y_t \mbox{ is not stationary} \\ H_A: Y_t \mbox{ is stationary} \end{array} \right. \]

  • is replaced by \[ \left\{ \begin{array}{ll} H_0: Y_t \mbox{ has unit root} \\ H_A: Y_t \mbox{ does not have unit root } \end{array} \right. \]

  • Small p-value rejects the null hypothesis of non-stationarity, and means “Stationary”.

  • Fit AR(1) to the time series, and test whether \(\phi_1\) is significantly different from 1 using \[ \frac{ \hat \phi_1 -1 }{ SE(\hat \phi_1) } \] Under the null of a unit root this statistic follows the Dickey-Fuller distribution, not \(N(0,1)\).
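This naive check can be computed directly in base R; a sketch on a simulated random walk (so the true \(\phi_1\) is 1):

```r
# Naive unit-root check: fit AR(1), then form (phi1_hat - 1) / SE(phi1_hat).
set.seed(1)
rw  <- cumsum(rnorm(200))             # random walk: true phi1 = 1
fit <- arima(rw, order = c(1, 0, 0))  # AR(1) fit (base R)

phi  <- fit$coef[["ar1"]]
se   <- sqrt(fit$var.coef["ar1", "ar1"])
stat <- (phi - 1) / se
stat   # compare to Dickey-Fuller critical values, not N(0,1)
```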


PP test

Phillips-Perron test for the null hypothesis that x has a unit root.

  • Same Hypothesis as ADF test

  • Small p-value means “Stationary”.

  • The tests are similar to ADF tests, but they incorporate an automatic correction to the DF procedure to allow for autocorrelated residuals.

  • Calculation of the test statistics is complex.

  • The tests usually give the same conclusions as the ADF tests



P-value of (KPSS, ADF, PP) tests

  • (LARGE, small, small) \(\to\) All three indicating stationarity.

  • (small, LARGE, LARGE) \(\to\) All three indicating non-stationarity.

  • (anything else) \(\to\) Conflicting conclusions; inconclusive.

  • Stationarity.tests() performs all of the three tests.
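The course's Stationarity.tests() itself is not listed in these notes; a minimal wrapper in the same spirit (hypothetical name and output format, assuming the tseries package) might look like:

```r
# Hypothetical sketch of a Stationarity.tests()-style wrapper:
# collect the p-values of the KPSS, ADF, and PP tests in one named vector.
library(tseries)   # assumed installed

stationarity_tests <- function(x) {
  p <- c(KPSS = suppressWarnings(kpss.test(x)$p.value),
         ADF  = suppressWarnings(adf.test(x)$p.value),
         PP   = suppressWarnings(pp.test(x)$p.value))
  round(p, 3)
}

set.seed(1)
stationarity_tests(arima.sim(model = list(ar = 0.5), n = 200))
```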



Ex: Use it on above datasets

## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value greater than printed p-value
##        KPSS  ADF   PP
## p-val:  0.1 0.01 0.01
  • Large, small, small \(\to\) stationary. As it should be.



## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value greater than printed p-value
##        KPSS  ADF   PP
## p-val:  0.1 0.01 0.01
  • small, Large, small \(\to\) conflicting. The first two indicate non-stationarity; the last one indicates stationarity.



## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value smaller than printed p-value
##        KPSS   ADF   PP
## p-val: 0.01 0.706 0.01
  • small, Large, Large \(\to\) All three indicating non-stationarity.



Ex: Australian Steel Data

## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value smaller than printed p-value
##        KPSS   ADF   PP
## p-val: 0.01 0.408 0.01

## Warning in pp.test(A): p-value smaller than printed p-value

## Warning in pp.test(A): p-value smaller than printed p-value
##        KPSS   ADF   PP
## p-val: 0.01 0.049 0.01
## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value smaller than printed p-value
##        KPSS  ADF   PP
## p-val: 0.01 0.01 0.01
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value greater than printed p-value
##        KPSS   ADF   PP
## p-val:  0.1 0.046 0.01



3. Modeling Lake HURON data 4 different ways



a) Direct ARMA fit

If we treat it as a “stationary” series, we can try to fit an ARMA(p,q) model.

## Warning in kpss.test(A): p-value smaller than printed p-value
##        KPSS  ADF    PP
## p-val: 0.01 0.24 0.025
## Series: Lake 
## ARIMA(1,0,1) with non-zero mean 
## 
## Coefficients:
##          ar1     ma1    mean
##       0.7665  0.3393  9.1290
## s.e.  0.0773  0.1123  0.3861
## 
## sigma^2 estimated as 0.4784:  log likelihood=-101.09
## AIC=210.18   AICc=210.62   BIC=220.48

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from a Normal distribution
##   SD         : Standard Deviation of the series
##       BL15  BL20  BL25  ML15  ML20   JB    SD
## [1,] 0.963 0.952 0.934 0.567 0.641 0.89 0.684

Analysis 1 (direct fit)

  • auto.arima() chooses ARMA(1,1) with min AICc.

  • ARMA(1,1) with constant mean was fit directly to the data. With observation \(Y_t\),

\[{\large Y_t \hspace{3mm} = \hspace{3mm} \mu + X_t }\] \[{\large \hspace{20mm} X_t \hspace{3mm} = \hspace{3mm} \phi_1 X_{t-1} + e_t + \theta_1 e_{t-1} }\] \[{\large e_t \sim WN(0, \sigma^2) }\]
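The fit reported below can be reproduced along these lines with base R (a sketch; Lake is assumed here to be the built-in LakeHuron series, which may differ slightly from the dataset used in the notes):

```r
# Reproduce the ARMA(1,1)-with-mean fit using base R's arima().
Lake <- LakeHuron                         # assumption: notes' 'Lake' data
fit1 <- arima(Lake, order = c(1, 0, 1))   # include.mean = TRUE by default
fit1   # ar1, ma1, and intercept (the mean), with standard errors
```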

## Series: Lake 
## ARIMA(1,0,1) with non-zero mean 
## 
## Coefficients:
##          ar1     ma1    mean
##       0.7665  0.3393  9.1290
## s.e.  0.0773  0.1123  0.3861
## 
## sigma^2 estimated as 0.4784:  log likelihood=-101.09
## AIC=210.18   AICc=210.62   BIC=220.48







b) Forcing Linear Trend

If we choose that the level is going down linearly, we can try to fit a line with ARMA errors.

## 
## Call:
## lm(formula = Lake ~ time(Lake))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.50919 -0.74760 -0.01556  0.75966  2.53409 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 55.278141   7.922614   6.977 4.02e-10 ***
## time(Lake)  -0.024071   0.004119  -5.843 7.16e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.136 on 95 degrees of freedom
## Multiple R-squared:  0.2644, Adjusted R-squared:  0.2566 
## F-statistic: 34.14 on 1 and 95 DF,  p-value: 7.165e-08

## Series: Reg2$residuals 
## ARIMA(1,0,1) with zero mean 
## 
## Coefficients:
##          ar1     ma1
##       0.6671  0.3827
## s.e.  0.0937  0.1135
## 
## sigma^2 estimated as 0.452:  log likelihood=-98.72
## AIC=203.44   AICc=203.7   BIC=211.16

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from a Normal distribution
##   SD         : Standard Deviation of the series
##       BL15  BL20  BL25 ML15  ML20    JB    SD
## [1,] 0.985 0.974 0.969 0.19 0.189 0.782 0.669

Analysis 2 (linear trend)

  • Linear trend was fit to \(Y_t\), then the residuals were fitted with ARMA(1,1).

  • We observe \(Y_t = a + bt + X_t\), where \(X_t\) is ARMA.

  • In other words,

\[{\large Y_t \hspace{3mm} = \hspace{3mm} a + bt + X_t }\]

\[{\large \hspace{20mm} X_t \hspace{3mm} = \hspace{3mm} \phi_1 X_{t-1} + e_t + \theta_1 e_{t-1} }\]

\[{\large e_t \sim WN(0, \sigma^2) }\]
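A base-R sketch of this two-step fit (Lake again assumed to be the built-in LakeHuron series; the object name Reg2 follows the output shown below):

```r
# Two-step fit: linear trend by lm(), then zero-mean ARMA(1,1) on residuals.
Lake <- LakeHuron                    # assumption: notes' 'Lake' data
Reg2 <- lm(Lake ~ time(Lake))        # deterministic linear trend in time
res  <- resid(Reg2)                  # detrended series (mean zero)
fit2 <- arima(res, order = c(1, 0, 1), include.mean = FALSE)
fit2
```

The residual fit uses include.mean = FALSE because the lm() residuals average to zero by construction.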

## 
## Call:
## lm(formula = Lake ~ time(Lake))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.50919 -0.74760 -0.01556  0.75966  2.53409 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 55.278141   7.922614   6.977 4.02e-10 ***
## time(Lake)  -0.024071   0.004119  -5.843 7.16e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.136 on 95 degrees of freedom
## Multiple R-squared:  0.2644, Adjusted R-squared:  0.2566 
## F-statistic: 34.14 on 1 and 95 DF,  p-value: 7.165e-08
## Series: Reg2$residuals 
## ARIMA(1,0,1) with zero mean 
## 
## Coefficients:
##          ar1     ma1
##       0.6671  0.3827
## s.e.  0.0937  0.1135
## 
## sigma^2 estimated as 0.452:  log likelihood=-98.72
## AIC=203.44   AICc=203.7   BIC=211.16







c) Box-Jenkins Differencing (ARIMA modeling)

Let’s look at the difference between the observations: \[ \mathrm{diff}(Y)_t \hspace{3mm} = \hspace{3mm} Y_t - Y_{t-1} \]

## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value greater than printed p-value
##        KPSS  ADF   PP
## p-val:  0.1 0.01 0.01
## Series: diff.Y 
## ARIMA(1,0,2) with zero mean 
## 
## Coefficients:
##          ar1      ma1      ma2
##       0.6385  -0.5349  -0.3514
## s.e.  0.1345   0.1445   0.1055
## 
## sigma^2 estimated as 0.4812:  log likelihood=-99.88
## AIC=207.76   AICc=208.2   BIC=218.02

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from a Normal distribution
##   SD         : Standard Deviation of the series
##       BL15  BL20  BL25  ML15  ML20    JB    SD
## [1,] 0.981 0.958 0.946 0.551 0.545 0.575 0.681
## Series: Lake 
## ARIMA(1,1,2) 
## 
## Coefficients:
##          ar1      ma1      ma2
##       0.6385  -0.5349  -0.3514
## s.e.  0.1345   0.1445   0.1055
## 
## sigma^2 estimated as 0.4812:  log likelihood=-99.88
## AIC=207.76   AICc=208.2   BIC=218.02

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from a Normal distribution
##   SD         : Standard Deviation of the series
##       BL15  BL20  BL25  ML15  ML20    JB    SD
## [1,] 0.979 0.956 0.942 0.524 0.495 0.577 0.677

Analysis 3 (take difference)

  • Difference of \(Y_t\) was taken. The difference seems to be ARMA(1,2).

  • In other words, \(Y_t\) is ARIMA(1,1,2), meaning \[{\large \bigtriangledown Y_t = Y_t - Y_{t-1} = X_t }\] \[{\large \hspace{20mm} X_t \hspace{3mm} = \hspace{3mm} \phi_1 X_{t-1} + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} }\] \[{\large e_t \sim WN(0,\sigma^2) }\]
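The equivalence of the two fits shown above (ARMA(1,2) on the differenced series vs. ARIMA(1,1,2) on the original) can be checked directly; a base-R sketch, with Lake assumed to be the built-in LakeHuron series:

```r
# Fitting ARMA(1,2) to the differenced series and ARIMA(1,1,2) to the
# original series gives essentially the same coefficient estimates.
Lake   <- LakeHuron                 # assumption: notes' 'Lake' data
diff.Y <- diff(Lake)
fitA <- arima(diff.Y, order = c(1, 0, 2), include.mean = FALSE)
fitB <- arima(Lake,   order = c(1, 1, 2))
rbind(on_diff = coef(fitA), on_original = coef(fitB))  # near-identical rows
```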

## Series: Lake 
## ARIMA(1,1,2) 
## 
## Coefficients:
##          ar1      ma1      ma2
##       0.6385  -0.5349  -0.3514
## s.e.  0.1345   0.1445   0.1055
## 
## sigma^2 estimated as 0.4812:  log likelihood=-99.88
## AIC=207.76   AICc=208.2   BIC=218.02







d) Box-Jenkins Differencing (ARIMA(0,1,0) modeling)

Let’s look at the difference between the observations: \[ \mathrm{diff}(Y)_t \hspace{3mm} = \hspace{3mm} Y_t - Y_{t-1} \]

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from a Normal distribution
##   SD         : Standard Deviation of the series
##       BL15  BL20  BL25  ML15  ML20    JB    SD
## [1,] 0.176 0.175 0.132 0.172 0.117 0.384 0.737
## Series: Lake 
## ARIMA(0,1,0) 
## 
## sigma^2 estimated as 0.5383:  log likelihood=-106.49
## AIC=214.98   AICc=215.02   BIC=217.54

Analysis 4 (take difference)

  • Difference of \(Y_t\) was taken. The difference seems to be WN.

  • In other words, \(Y_t\) is ARIMA(0,1,0), meaning \[{\large \bigtriangledown Y_t = Y_t - Y_{t-1} = X_t }\] \[{\large \hspace{20mm} X_t \hspace{3mm} = \hspace{3mm} e_t }\] \[{\large e_t \sim WN(0,\sigma^2) }\]
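ARIMA(0,1,0) is a random walk, so the point forecast at every horizon is simply the last observed value; a base-R sketch (Lake assumed to be the built-in LakeHuron series):

```r
# ARIMA(0,1,0): no parameters to estimate; forecasts are flat at the
# last observation because the expected difference is zero.
Lake <- LakeHuron                         # assumption: notes' 'Lake' data
fit4 <- arima(Lake, order = c(0, 1, 0))
pred <- predict(fit4, n.ahead = 3)$pred
pred                                      # all equal to tail(Lake, 1)
```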

## Series: Lake 
## ARIMA(0,1,0) 
## 
## sigma^2 estimated as 0.5383:  log likelihood=-106.49
## AIC=214.98   AICc=215.02   BIC=217.54





Summary

These tests are available in R: adf.test(), pp.test(), and kpss.test() in the tseries package, or all three at once with Stationarity.tests().

p-value of (Large, Small, Small) indicates stationarity.

The classical method tries to identify the trend and remove it by polynomial regression.

Box-Jenkins Method tries to take a difference between today’s data and yesterday’s by applying \(\bigtriangledown=(1-B)\) operator.

Applying the B-J method will often stationarize a non-stationary time series.

We can use ARMA(\(p,q\)) model to model the stationarized version of the series.

If you apply \(\bigtriangledown\) once and then fit ARMA(1,2), the resulting model is called ARIMA(1,1,2).





